##        X          fixed.acidity   volatile.acidity   citric.acid
##  Min.   :   1.0   Min.   : 4.60   Min.   :0.1200     Min.   :0.000
##  1st Qu.: 400.5   1st Qu.: 7.10   1st Qu.:0.3900     1st Qu.:0.090
##  Median : 800.0   Median : 7.90   Median :0.5200     Median :0.260
##  Mean   : 800.0   Mean   : 8.32   Mean   :0.5278     Mean   :0.271
##  3rd Qu.:1199.5   3rd Qu.: 9.20   3rd Qu.:0.6400     3rd Qu.:0.420
##  Max.   :1599.0   Max.   :15.90   Max.   :1.5800     Max.   :1.000
##  residual.sugar    chlorides         free.sulfur.dioxide
##  Min.   : 0.900    Min.   :0.01200   Min.   : 1.00
##  1st Qu.: 1.900    1st Qu.:0.07000   1st Qu.: 7.00
##  Median : 2.200    Median :0.07900   Median :14.00
##  Mean   : 2.539    Mean   :0.08747   Mean   :15.87
##  3rd Qu.: 2.600    3rd Qu.:0.09000   3rd Qu.:21.00
##  Max.   :15.500    Max.   :0.61100   Max.   :72.00
##  total.sulfur.dioxide   density          pH              sulphates
##  Min.   :  6.00         Min.   :0.9901   Min.   :2.740   Min.   :0.3300
##  1st Qu.: 22.00         1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500
##  Median : 38.00         Median :0.9968   Median :3.310   Median :0.6200
##  Mean   : 46.47         Mean   :0.9967   Mean   :3.311   Mean   :0.6581
##  3rd Qu.: 62.00         3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300
##  Max.   :289.00         Max.   :1.0037   Max.   :4.010   Max.   :2.0000
##     alcohol         quality
##  Min.   : 8.40   Min.   :3.000
##  1st Qu.: 9.50   1st Qu.:5.000
##  Median :10.20   Median :6.000
##  Mean   :10.42   Mean   :5.636
##  3rd Qu.:11.10   3rd Qu.:6.000
##  Max.   :14.90   Max.   :8.000
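The quartiles in the summary give a quick way to check which variables have extreme maxima. The sketch below applies the common 1.5 * IQR rule to four variables, using Q1, Q3, and maximum values copied from the summary above. The 1.5 multiplier and the data frame name `quartiles` are assumptions made for illustration, not something the original analysis states.

```r
# Q1, Q3 and max copied from the summary() output above
quartiles <- data.frame(
  variable = c("residual.sugar", "chlorides", "total.sulfur.dioxide", "sulphates"),
  q1  = c(1.900, 0.070, 22, 0.55),
  q3  = c(2.600, 0.090, 62, 0.73),
  max = c(15.500, 0.611, 289, 2.00)
)

# Upper fence under the 1.5 * IQR rule (the multiplier is an assumed convention)
quartiles$upper_fence <- quartiles$q3 + 1.5 * (quartiles$q3 - quartiles$q1)

# TRUE when the maximum lies beyond the fence, i.e. at least one high outlier
quartiles$has_high_outlier <- quartiles$max > quartiles$upper_fence

print(quartiles)
```

For all four variables the maximum lies well beyond the fence, which matches the outliers visible in the histograms.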
Most of the wines are rated quality 5 or 6, which also shows that we don’t
have enough data for the other quality levels.
The sulphates histogram has outliers: a few wines have much higher sulphate
values than the rest.
Most residual sugar values lie between 1.5 and 2.5, so most of the wines are
low in sugar.
Fixed acidity is roughly normally distributed, with a few wines reaching
values as high as about 16.
Volatile acidity is roughly normally distributed, with most values between 0.4
and 0.8.
Citric acid is skewed toward the low end, and its distribution is widely
spread.
Chlorides are right-skewed (a long tail toward high values) with some
outliers; the interquartile range runs from 0.07 to 0.09.
Free sulfur dioxide has an interquartile range from 7 to 21.
Density has an interquartile range from 0.9956 to 0.9978, a mean of 0.9967,
and a maximum of 1.0037.
pH has an interquartile range from about 3.2 to 3.4.
Alcohol is right-skewed with a few outliers; its interquartile range runs
from 9.5 to 11.1.
The data set has 1,599 observations of 12 variables (plus the row index X),
but the observations are not evenly distributed across the quality levels.
I would like to find out which factors actually determine the quality of the
wine.
Residual sugar has a few outliers, so I subset the data to sugar values below
8 and plotted it again to see the distribution of sugar.
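A minimal sketch of that subsetting step, in base R. The data frame name `wine` and the handful of values in it are a toy stand-in, since the original code and file are not shown; only the cutoff of 8 comes from the text.

```r
# Toy stand-in for the wine data frame; the name `wine` and these values
# are illustrative only, not taken from the real data set
wine <- data.frame(residual.sugar = c(1.9, 2.2, 2.6, 2.0, 15.5, 9.0, 2.4, 1.6))

# Keep only rows with residual sugar below 8, the cutoff mentioned in the text
wine_low_sugar <- subset(wine, residual.sugar < 8)

# Histogram of the trimmed variable
hist(wine_low_sugar$residual.sugar,
     breaks = 10,
     main = "Residual sugar (values below 8)",
     xlab = "residual.sugar")
```

Trimming only changes what is plotted; the full data frame stays available for the later analysis.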
The plot above shows a good amount of correlation between several of the
variables.
The scatter plot above shows that citric acid rises together with fixed
acidity; the outliers were removed to make the pattern easier to see.
The scatter plot above shows that wines with lower pH values have higher acidity.
# quality vs alcohol
The bar plot shows that higher-quality wines have relatively higher alcohol
content than lower-quality wines.
The plot above shows that low-quality wines have more of a vinegar taste
(volatile acidity), but between quality 7 and 8 there is no significant change.
Most wines have a similar level of saltiness (chlorides); quality 3 is the
exception, with a much wider interquartile range.
Here we can see that more total sulfur dioxide goes with more free sulfur
dioxide, but in this data set the points cluster at the low end, which shows
that most of the wines are low in sulfur.
There is a correlation between density and alcohol: the higher the density,
the lower the alcohol content.
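The direction of this relationship can be checked numerically with a correlation coefficient. The following is a sketch on made-up values, chosen only to sit in the ranges shown by the summary; `wine` is a hypothetical data frame name and the resulting coefficient says nothing about the real data.

```r
# Toy values for illustration only; the real data set has 1,599 rows
wine <- data.frame(
  density = c(0.9978, 0.9968, 0.9970, 0.9956, 0.9940, 0.9925),
  alcohol = c(9.4, 9.8, 10.0, 10.5, 11.8, 12.9)
)

# Pearson correlation; a negative value means alcohol falls as density rises
r <- cor(wine$density, wine$alcohol)
print(r)

# Scatter plot with a fitted straight line as a visual check
plot(wine$density, wine$alcohol, xlab = "density", ylab = "alcohol")
abline(lm(alcohol ~ density, data = wine))
```

On the real data, the same two calls would give the coefficient behind the scatter plot described above.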
Acidity drives pH, but interestingly volatile acidity does not appear to
affect the pH of the wine.
This plot shows that there are different groups of wine: lower quality with
moderate alcohol, moderate quality with less alcohol, and higher quality with
more alcohol.
One of the interesting relationships I found in the data set is that density
correlates with the quality of the wine.
We know that acidity affects pH, but the interesting relationship I found is
that volatile acidity does not affect the pH of the wine.
The multivariate plot above shows that quality 5 wines have higher acidity
than quality 7 wines.
We can see that there is a correlation between acidity and pH.
I think citric acid is the reason for the acidity, because wines with lower
citric acid have lower acidity and higher pH values.
The streak of sulfur points is shorter for the lower-quality wines, which
suggests that lower-quality wines contain less sulfur.
The grid plot above shows that quality 5 and 6 wines have more sulfur, and
also less citric acid, which leads to lower acidity.
Quality 5 and 6 wines have high acidity and a wide range of density, whereas
qualities 3, 4, 7, and 8 have lower acidity; quality 8 is low in both acidity
and density.
The plot above shows that lower density goes with higher alcohol content;
quality 7 wines in particular have lower density and higher alcohol content.
Even though we have a decent (more or less equal) number of wines at quality
5 and 6, the alcohol content is what separates the quality levels.
The scatter plot shows that an increase in citric acid goes with an increase
in fixed acidity. I chose this plot because I suspected that volatile acidity
makes no difference to pH, and before testing that I wanted to see the
relationship between citric acid and fixed acidity, since citric acid
contributes to fixed acidity.
We can see that the points are widely spread without any correlation, and
that every quality level covers a wide range of volatile acidity.
The plot shows that a high citric acid level leads to high acidity and a low
pH, and likewise that a low citric acid level gives low acidity and a high pH.
It also suggests that wine quality depends mainly on the citric acid level.
Volatile acidity does not affect the pH of the wine and is not a factor in
wine quality; fixed acidity is one of the main factors that decide the
quality of the wine.
Fixed acidity is driven by citric acid, so citric acid regulates the quality
of the wine.
I started the analysis by looking at the summary of the data. There I found
a few variables with a large gap between the interquartile range and the
maximum value, which indicates outliers. In the univariate analysis I found
that wine quality is not evenly distributed, so we don’t have enough data at
each quality level. This is a big drawback, because we can’t draw conclusions
about every quality level. In the bivariate analysis I made a correlation
plot to see the relationships between the variables and found the factors
that mainly influence the quality of the wine: citric acid and acidity.
While plotting volatile acidity against pH I didn’t see much correlation,
which raised the question of why volatile acidity does not influence the pH
of the wine. I then proceeded with the multivariate analysis, where I created
citrusCut as a new variable and explored the relationship between fixed
acidity, pH, and citric acid. In the final plot I concluded that volatile
acidity does not affect the pH of the wine.
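A binned variable such as citrusCut can be built with `cut()`. The sketch below is an assumption about how it might look: the break points are the citric acid quartiles from the summary, because the original breaks are not shown, and `wine` with its values is a toy stand-in.

```r
# Toy citric acid values for illustration only
wine <- data.frame(citric.acid = c(0.00, 0.05, 0.09, 0.20, 0.26, 0.40, 0.49, 1.00))

# Break points taken from the summary quartiles (min, Q1, median, Q3, max);
# this binning is an assumption, the original breaks are not shown
breaks <- c(0, 0.09, 0.26, 0.42, 1.00)

# include.lowest = TRUE keeps the minimum value 0 in the first bin
wine$citrusCut <- cut(wine$citric.acid, breaks = breaks, include.lowest = TRUE)

# Number of wines in each citric acid bin
print(table(wine$citrusCut))
```

The resulting factor can then be used to colour or facet a plot of fixed acidity against pH.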
For future exploratory analysis, I plan to look for a more recent data set in
which the wine quality levels are evenly distributed.